
fix(db): fix number of files in db, startup hang, ram issues and flushing issues #379

Merged
merged 2 commits into main
Nov 20, 2024

Conversation

cchudant
Member

@cchudant cchudant commented Nov 6, 2024

Pull Request type

  • Bugfix

What is the current behavior?

Pragma devnet has >1M sst files. This is because compaction couldn't keep up with the atomic flushes we do after every block. This resulted in db startup taking an absurd amount of time (sometimes tens of minutes, scanning most of those >1M files). In addition, the flushes were also taking a lot of time (~10s!).

What is the new behavior?

I could replicate the issue by copying the pragma db locally, but it was much easier to replicate it by syncing a node with a modified version of block-import that force-flushes every block (this mimics the behavior of the sequencer, which does the same) and watching the number of files in the db.

Tiered compaction is not suited to our needs, so I switched the db to universal compaction. With this compaction mode I managed to sync 100k blocks with only about a hundred sst files in the db, instead of around 30 thousand. Hopefully this fixes all of these issues, and the RAM taken by rocksdb should now be bounded.
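For illustration, a minimal sketch of what switching a column to universal compaction can look like with the rust-rocksdb crate. The helper name and the example budget are assumptions for this sketch, not the PR's actual code:

```rust
use rocksdb::{DBCompressionType, Options};

#[allow(non_upper_case_globals)]
const GiB: usize = 1024 * 1024 * 1024;

/// Hypothetical helper: per-column options using universal compaction.
/// `memtable_budget` is the approximate RAM budget for the column.
fn universal_column_options(memtable_budget: usize) -> Options {
    let mut options = Options::default();
    options.set_compression_type(DBCompressionType::Zstd);
    // Switches the compaction style to Universal and sizes the memtables
    // against the budget, so flushed files are merged aggressively instead
    // of accumulating into millions of small sst files.
    options.optimize_universal_style_compaction(memtable_budget);
    options
}

// e.g. a block-info column with a ~1 GiB budget:
// let opts = universal_column_options(1 * GiB);
```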

Does this introduce a breaking change?

Theoretically, no. However, I think you should upgrade your sequencer db by syncing a full node on it and replacing the sequencer's db with the full node's, to benefit from all of the new db options. Using an old db will work, but I don't think it will do much to rewrite the old sst files.

Member

@antiyro antiyro left a comment


lgtm, if you could just add some comments on the currently used parameters

options.set_max_subcompactions(cores as _);

options.set_max_log_file_size(10 * MiB);
options.set_max_open_files(2048);
Member


could you add a comment describing each of those numbers?

Contributor


Yes, or at least a doc for the function explaining how it addresses the issue this PR is solving.

Member Author


The max open files was added in #367.
I'll add info to the options; I may also solve #231 while I'm at it, if it doesn't take me too long.
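For what it's worth, here is a commented sketch of the quoted parameters; the rationales below are my reading of this thread, not documentation from the PR itself:

```rust
use rocksdb::Options;

#[allow(non_upper_case_globals)]
const MiB: usize = 1024 * 1024;

fn apply_global_options(options: &mut Options, cores: u32) {
    // Allow a single compaction job to be split across all cores, so
    // compaction can keep up with the atomic flush done after every block.
    options.set_max_subcompactions(cores as _);
    // Cap rocksdb's own LOG file so it is rotated instead of growing forever.
    options.set_max_log_file_size(10 * MiB);
    // Bound the number of sst files held open at once (and the table-reader
    // memory that goes with them); added in #367.
    options.set_max_open_files(2048);
}
```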

options.set_compression_type(DBCompressionType::Zstd);
match self {
Column::BlockNToBlockInfo | Column::BlockNToBlockInner => {
options.optimize_universal_style_compaction(1 * GiB);
Member


Same here: why 1 GB?

Member Author

@cchudant cchudant Nov 7, 2024


I can't access the column size metrics as of now, so I eyeballed it.
We can tune all of this later; the basic idea is that this argument is the RAM budget of the column. I like to think that a normal machine should be able to run madara in at most 4 GB of RAM, and I based these tuning parameters around that.

I have not run any tests to see whether these numbers are optimal; they just looked okay to me.
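For context on what the budget buys: per my reading of RocksDB's upstream `options.cc`, `OptimizeUniversalStyleCompaction(memtable_memory_budget)` spends the budget roughly as sketched below (the exact constants may differ between RocksDB versions):

```rust
/// Approximate budget split applied by RocksDB's
/// OptimizeUniversalStyleCompaction (per my reading of upstream options.cc).
/// Returns (write_buffer_size, max_write_buffer_number,
/// min_write_buffer_number_to_merge).
fn universal_budget_split(memtable_memory_budget: usize) -> (usize, i32, i32) {
    (memtable_memory_budget / 4, 6, 2)
}

fn main() {
    const GIB: usize = 1024 * 1024 * 1024;
    let (write_buffer_size, max_bufs, min_merge) = universal_budget_split(GIB);
    // A 1 GiB budget means 256 MiB memtables, at most 6 alive at once,
    // and at least 2 merged together per flush.
    println!("{write_buffer_size} {max_bufs} {min_merge}");
}
```

If that reading is right, a 1 GiB per-column budget caps the memtable-side RAM of that column at roughly 1.5 GiB worst case (6 × 256 MiB), which fits the "4 GB machine" target mentioned above.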

crates/client/db/src/rocksdb_options.rs
@cchudant cchudant self-assigned this Nov 19, 2024
@shamsasari
Contributor

Is this good to merge? I will need to deal with the merge conflicts in #388.

@antiyro antiyro merged commit 018b6cc into main Nov 20, 2024
10 checks passed